Abstract

Common wheat (Triticum aestivum L.) is one of the most important cereals in the world. To improve wheat
quality and productivity, the genomic sequence of wheat must be determined. The large genome size
(~17 Gb/1C) and the hexaploid status of wheat have hampered the genome sequencing of wheat.
However, flow sorting of individual chromosomes has allowed us to purify and separately shotgun-sequence
a pair of telocentric chromosomes. Here, we describe the results of survey sequencing of wheat
chromosome 6B (914 Mb/1C) using massively parallel 454 pyrosequencing. From the 4.94 and 5.51 Gb of
shotgun sequence data from the two chromosome arms 6BS and 6BL, 235 and 273 Mb of sequence were
assembled to cover ~55.6 and 54.9% of the total genomic regions, respectively. Repetitive sequences
composed 77 and 86% of the assembled sequences on 6BS and 6BL, respectively. Within the assembled
sequences, we predicted a total of 4798 non-repetitive gene loci with evidence of expression from the
wheat transcriptome data. The numbers and chromosomal distribution patterns of the genes for tRNAs and
microRNAs in wheat 6B were investigated, and the results suggested a significant involvement of DNA
transposon diffusion in the evolution of these non-protein-coding RNA genes. A comparative analysis of
the genomic sequences of wheat 6B and monocot plants clearly indicated the evolutionary conservation of
gene contents.
Key words: wheat; chromosome 6B; genome sequencing; next-generation sequencing 
1. Introduction

Common wheat, also known as bread wheat
(Triticum aestivum L.), is a major staple food crop in
many parts of the world; therefore, there is a strong
demand for the genetic improvement of wheat to
achieve better quality, higher yield, adaptation to
various environments, and tolerance to biotic stresses.
These improvements would contribute significantly to
human welfare. Highly detailed genomic information
is an important tool for the genetic improvement of
wheat, but the full sequencing of the wheat genome
has been challenging.

Wheat has a large genome of ~17 Gb and is allohexaploid,
with three homoeologous genomes (2n = 6x = 42, genome
formula AABBDD) that have been suggested to originate from
Triticum urartu (2n = 2x = 14, AA) as a donor of the A genome,
Aegilops tauschii (2n = 2x = 14, DD) as a donor of the D genome,
and Aegilops speltoides (2n = 2x = 14, SS) or a related species
as a possible donor of the B genome, although the identity of
the B genome donor is still debated. Additionally, ~80-90% of
the wheat genome is composed of repetitive sequences, which is
a significantly higher percentage than in Brachypodium (22%),
rice (26%), and sorghum (54%). The large size and
polyploidy-related complexity of the wheat genome have hampered
genomic analysis, and decoding of the whole genome remains
challenging, even though next-generation sequencing (NGS)
technology has recently been applied.

Chromosome sorting by flow cytometry can reduce
sample complexity and simplify the sequencing of
complex genomes by dividing these genomes into
smaller parts. Using this method, survey sequences of
an individual chromosome (1H) or chromosome arms
(2HS-7HL) in the barley genome have been obtained
and analysed with NGS technology. In wheat, the
sorting of single chromosomes or chromosome arms
from the cultivar 'Chinese Spring' (CS) and its aneuploid
lines has enabled the construction of chromosome
(arm)-specific BAC libraries, and these BAC libraries
have served as critical resources for the development
of physical maps and map-based genome sequencing
by the International Wheat Genome Sequencing
Consortium (IWGSC; http://www.wheatgenome.org/,
20 September 2013, date last accessed). As in barley,
the sequencing of several wheat chromosomes (1A, 1B,
1D, 3B, 4A, and 5A) or chromosome arms (1AL, 3AS,
7BS, and 7DS) has been conducted using NGS.
These survey sequences from whole-chromosome
shotgun sequencing are highly informative, bringing
not only insights into the molecular organization and
evolution of the wheat genome at an unsurpassed
resolution but also detailed contents of the syntenic genes
among grass genomes. Recently, whole-genome
shotgun sequence analysis was performed on the two
diploid progenitors of wheat, T. urartu and Ae. tauschii,
which provided information that was useful for decoding
the complex polyploid nature of the wheat genome.

Chromosome 6B is the third largest chromosome in
common wheat, with a total molecular size of 914 Mb,
representing 5.4% of the wheat genome. Chromosome
6B consists of a 415 Mb short arm (6BS) and a 498 Mb
long arm (6BL). Similar to chromosome 1B, chromosome
6B can be distinguished from other wheat chromosomes
in the karyotype by the presence of a
satellite with a secondary constriction on the short
arm. A translocation of chromosomal segments from
6BS and 2BS has also been reported. Moreover, a
pericentromeric inversion has been observed in the cultivar
CS. These structural features differentiate
chromosome 6B from the homoeologous chromosomes
6A and 6D.

Up to 30 loci, including genes underlying agronomic,
morphological, and physiological traits, have been
genetically mapped to wheat chromosome 6B.
Among these genes, only two gene loci, Nor-B2
(nucleolus organizer region) and Gli-B2 (α/β-gliadin seed
storage protein), have been studied well. The Nor-B2
locus, which is located in the secondary constriction
of 6BS, contains approximately 5500 copies of the
rRNA genes, and the Gli-B2 locus is located in a
position distal to the Nor-B2 locus, within ~10 cM of the
6BS satellite region. A recent sequencing study of
the Gli-B2 locus mapped to chromosome 6B revealed
that 11 α/β-gliadin genes, including one pseudogene,
were clustered within an ~260 kb genomic region.
In tetraploid wheat, two agronomically important
genes on chromosome 6B, Gpc-B1 (grain protein
content) and Yr36 (wheat stripe rust resistance), were
isolated using a map-based cloning strategy.

In this study, we conducted whole-chromosome 
shotgun sequencing using DNA amplified from the 
flow-sorted chromosome arms 6BS and 6BL, which 
were derived from a double-ditelosomic 6B (dDt6B) 
line of CS. The DNA samples of 6BS and 6BL were 
sequenced with a long-read-type NGS (Roche 454 GS- 
FLX Titanium). The assembled sequences were analysed 
to characterize the genomic composition of wheat 
chromosome 6B, including its gene and repetitive se- 
quence contents and its syntenic relationship with 
other grass genomes, and to identify microRNA 
(miRNA) and tRNA precursors. These data will be
useful for developing new 6B-specific molecular
markers for the construction of BAC-based physical maps
and for future molecular breeding with marker-assisted
selection, and they will increase the understanding of the
evolutionary and functional aspects of the wheat genome.

2. Materials and methods 

2.1. Plant materials 

Seeds of dDt6B of the hexaploid wheat cultivar CS
(accession number LPGKU2269) were obtained
from the National BioResource Project of Japan
(http://www.shigen.nig.ac.jp/wheat/komugi/top/top.jsp, 20
September 2013, date last accessed). The dDt6B line
was originally developed by Sears, and this line
contains chromosome 6B as a pair of telosomes, of which
one is a short arm (6BS), and the other is a long arm (6BL).
The karyotype (20" + t"6BS + t"6BL) was confirmed by
C-banding.

2.2. Chromosome sorting and DNA amplification

Liquid suspensions of intact mitotic chromosomes
were prepared from synchronized root tips. The
samples were stained with 2 µg/ml 4',6-diamidino-2-phenylindole
(DAPI), and the telosomes were sorted
using a FACSVantage SE flow cytometer (Becton-Dickinson,
San Jose, USA). The level of purity in the
sorted fractions was determined by fluorescence in situ
hybridization (FISH) according to the method described
by Kubaláková et al. The DNA of the sorted chromosome
arms was purified as described by Šimková
et al. and then amplified by multiple displacement
amplification using the illustra™ GenomiPhi V2 DNA
Amplification Kit (GE Healthcare Bio-Sciences Corp.,
Piscataway, NJ, USA). The amplified DNA was purified
by ethanol precipitation before sequencing.

2.3. NGS and assembly 

The chromosome arm-specific DNA from 6BS and 6BL
was sequenced independently using the 454 GS-FLX
Titanium (Roche, CT, USA) at Hokkaido System Science
Co., Ltd (Sapporo, Hokkaido, Japan) and Takara Bio, Inc.
(Otsu, Shiga, Japan), respectively. The 454 sequenced
read data reported here have been deposited in the
DNA Data Bank of Japan (DDBJ) Sequence Read Archive
(DRA) and are available under accession number
DRA000979.

The sequence reads from each arm were assembled
using the GS Assembler 2.7 (Roche) with the parameter
'-large -vt' to remove the vector sequence. The
assembled contigs were compared with the registered
sequences of the human genome and the non-redundant
database in the DDBJ/EMBL/GenBank by BLASTN
with a threshold of E-value < 10^-5. Contigs with
human genomic sequence or other non-plant
(non-Viridiplantae) sequences as the best hit were removed
from subsequent analysis.

2.4. Detection of repeats and genes for functional 
RNA species 

Repeat regions were detected with Censor
(http://www.girinst.org/censor/index.php, 20 September 2013,
date last accessed) with the option '-mode norm'. In
addition to an existing repeat library, the TREP complete
annotation (http://wheat.pw.usda.gov/ITMI/Repeats/, 20
September 2013, date last accessed), a de novo repeat
family constructed using RepeatModeler
(http://www.repeatmasker.org/RepeatModeler.html, 20 September
2013, date last accessed) was used for repeat detection.
To detect ribosomal DNA (rDNA) regions, a homology
search against unmasked contigs using BLAT was performed
with the options '-fine -q=rna -out=blast'
and thresholds of >95% identity and >100 bp coverage.
As queries, four rDNA sequences, 5S (3IZ9), 5.8S (3IZ9),
18S (3IZ7), and 25S (3IZ9), and one spacer region
between the 25S and 18S rDNAs (X07841) were downloaded
from DDBJ/EMBL/GenBank (http://www.ddbj.nig.ac.jp/,
20 September 2013, date last accessed).

The tRNA genes were predicted using the tRNAscan-SE
ver. 1.3.1 program. The tRNA genes in Brachypodium
distachyon, rice, and sorghum were also predicted using
the same procedure. Any tRNAs that were annotated as
'possible pseudogenes' were not counted.

We performed the miRNA prediction following the
procedure in the previous report for wheat chromosome
5A. Mature and immature plant miRNAs were downloaded
from miRBase (http://www.mirbase.org/, 20
September 2013, date last accessed). In total, 4677
miRNAs from 193 different organisms were available.
First, using mature miRNAs as query sequences,
BLASTN was performed against the assembled
sequences with an E-value of 10 and a word
size of 7. If a mature miRNA showed hits with direct
and reverse orientation within a contig, then the
existence of an immature miRNA was postulated. Because
the BLASTN hits did not always cover an entire query
sequence, hit regions were extended in the 5'/3' direction,
and the number of mismatches between a query and a
hit region was recalculated using the ClustalW
program. Two or fewer mismatches in at least one
strand (direct/reverse) were accepted. Then, after an
extension of 13 bp at both edges, immature miRNAs
with lengths of >1000 bp were discarded because the
longest immature miRNA in the known miRNAs of the
plants we used is shorter than 1000 bp. Finally, the
secondary structure of the immature miRNA was predicted
with UNAFold 3.2. The minimal folding free energy
index (MFEI) was calculated for each structure using
the following equation: MFEI = AMFE/(G + C)%, where
the adjusted MFE (AMFE) is the minimal folding free energy
per 100 nucleotides. All sequences with an MFEI of >0.85
were accepted as miRNAs.
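
The MFEI filter described above can be illustrated with a short Python sketch. It is not the authors' code: the function names are hypothetical, the minimal free energy (MFE, in kcal/mol) is assumed to come from a separate folding program, and the sketch assumes that the magnitude of the (negative) MFE is used so that the index is positive, which the text does not state explicitly.

```python
def gc_percent(seq: str) -> float:
    """Return the G+C content of a sequence as a percentage (0-100)."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def mfei(seq: str, mfe_kcal_per_mol: float) -> float:
    """Minimal folding free energy index: MFEI = AMFE / (G + C)%.

    AMFE is the MFE scaled to 100 nucleotides. Assumption: the absolute
    value of the MFE is used, so a more stable fold gives a larger index.
    """
    gc = gc_percent(seq)
    if gc == 0.0:
        raise ValueError("MFEI is undefined for a sequence without G or C")
    amfe = 100.0 * abs(mfe_kcal_per_mol) / len(seq)
    return amfe / gc


def passes_mfei_filter(seq: str, mfe_kcal_per_mol: float,
                       threshold: float = 0.85) -> bool:
    """Accept a candidate precursor when its MFEI exceeds the threshold."""
    return mfei(seq, mfe_kcal_per_mol) > threshold


if __name__ == "__main__":
    # Toy precursor: 100 nt, 50% G+C, with a hypothetical MFE of -50 kcal/mol.
    candidate = "GCAU" * 25
    print(mfei(candidate, -50.0), passes_mfei_filter(candidate, -50.0))
```

For the toy candidate, AMFE is 50 and the G+C content is 50%, so the index is 1.0 and the candidate would pass the 0.85 cut-off.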

2.5. Gene annotation 

In this study, we determined the expressed loci using
two methods: FLcDNA/mRNA mapping and ab initio
gene prediction with EST evidence. These methods
were developed for the rice genome annotation (The
Rice Annotation Project). Wheat FLcDNAs were downloaded
from TriFLDB (http://trifldb.psc.riken.jp/index.pl,
20 September 2013, date last accessed), and the
mRNAs and ESTs were retrieved from DDBJ/EMBL/GenBank
with the keyword 'Triticum aestivum'.
FLcDNAs/mRNAs were processed by removing the
poly-A sequences and repeat masking by Censor using
the TREP complete annotation with the option '-mode
norm'. Then, processed FLcDNAs/mRNAs with lengths
of >29 bp after removal of the repeated regions were
used for further transcript mapping. These sequences
were mapped on assembled contigs using BLAST+ with
the parameters '-task blastn -evalue 0.01 -lcase_masking',
and with est2genome in the EMBOSS package with
the parameters '-align -mode both -gappenalty 8
-mismatch 6 -minscore 10'. Transcripts mapped to a contig
with >95% identity and >90% cumulative coverage
were accepted. Mapped regions that were masked
>50% by repeat sequences were discarded from further
analysis. To define the transcribed regions (loci),
mapped transcripts in exonic regions with at least one
base of overlap were clustered.
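
The acceptance thresholds and the overlap-based clustering can be sketched as follows. This is a simplified illustration rather than the pipeline used in the study: the `Hit` record and function names are hypothetical, each mapped transcript is treated as a single interval (the study clustered on exonic regions), and the strict inequalities simply follow the wording above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One transcript-to-contig alignment (hypothetical record layout)."""
    transcript: str
    contig: str
    start: int        # 1-based, inclusive
    end: int          # 1-based, inclusive
    identity: float   # percent identity
    coverage: float   # cumulative percent of the transcript aligned


def accept(hit: Hit, min_identity: float = 95.0,
           min_coverage: float = 90.0) -> bool:
    """Keep alignments with >95% identity and >90% cumulative coverage."""
    return hit.identity > min_identity and hit.coverage > min_coverage


def cluster_loci(hits):
    """Merge accepted hits that share at least one base on the same contig."""
    accepted = sorted((h for h in hits if accept(h)),
                      key=lambda h: (h.contig, h.start, h.end))
    loci = []
    for h in accepted:
        last = loci[-1] if loci else None
        if last and last["contig"] == h.contig and h.start <= last["end"]:
            # Overlap of one base or more: extend the current locus.
            last["end"] = max(last["end"], h.end)
            last["transcripts"].append(h.transcript)
        else:
            loci.append({"contig": h.contig, "start": h.start,
                         "end": h.end, "transcripts": [h.transcript]})
    return loci


if __name__ == "__main__":
    demo = [
        Hit("A", "contig1", 100, 200, 99.0, 95.0),
        Hit("B", "contig1", 200, 300, 98.0, 92.0),  # shares base 200 with A
        Hit("C", "contig1", 301, 400, 97.0, 99.0),  # adjacent, no shared base
        Hit("D", "contig2", 150, 250, 96.0, 91.0),
        Hit("E", "contig1", 120, 180, 90.0, 99.0),  # rejected: identity too low
    ]
    for locus in cluster_loci(demo):
        print(locus)
```

In the demo data, transcripts A and B fall into one locus because they share a single base, whereas C starts one base after that locus ends and therefore forms its own.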

Ab initio gene prediction was performed with the
AUGUSTUS program trained using the rice build 5
annotation data. If less than 50% of a predicted genic
region was masked by repeats, the predicted
gene was classified as a 'non-repetitive gene'. As
expression evidence, we used ESTs mapped by BLASTN with
>95% identity and >90% coverage on a contig.
Predicted genes that overlapped by at least one base with
EST-mapping regions were defined as 'proven predicted
genes'.

2.6. Detection of orthologues in cereals

Rice annotation data were retrieved from RAP-DB
(http://rapdb.dna.affrc.go.jp/, 20 September 2013,
date last accessed), and the data for B. distachyon
and sorghum were downloaded from Phytozome
(http://www.phytozome.net/, 20 September 2013,
date last accessed). Barley high-confidence genes
were downloaded from MIPS PlantsDB
(http://mips.helmholtz-muenchen.de/plant/barley/index.jsp, 20
September 2013, date last accessed). First, all
genes from the four species were mapped on the
combined contigs of 6BS and 6BL by tBLASTN using the
parameter '-e 10-5 -UT'. Second, the best pairs between a
gene and a contig were selected with the top hit of the
BLAST search. Third, mapped genes with at least 1 bp of overlap
were clustered on contigs.
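
The second step, selecting one best gene-contig pair per gene, can be sketched in Python as below. The row layout (gene, contig, E-value, bit score) and the function name are hypothetical, and treating the "top hit" as the hit with the highest bit score among those below the E-value cut-off is an assumption made for illustration only.

```python
def best_hits(rows, max_evalue=1e-5):
    """Return {gene: contig} using the top-scoring hit for each gene.

    rows: iterable of (gene, contig, evalue, bitscore) tuples, e.g. parsed
    from a tabular BLAST report. Hits at or above max_evalue are ignored.
    """
    best = {}
    for gene, contig, evalue, bitscore in rows:
        if evalue >= max_evalue:
            continue  # not significant under the assumed cut-off
        if gene not in best or bitscore > best[gene][1]:
            best[gene] = (contig, bitscore)
    return {gene: contig for gene, (contig, _) in best.items()}


if __name__ == "__main__":
    demo_rows = [
        ("gene1", "contigA", 1e-30, 200.0),
        ("gene1", "contigB", 1e-10, 90.0),   # weaker second hit, dropped
        ("gene2", "contigC", 1e-3, 40.0),    # fails the E-value cut-off
        ("gene3", "contigB", 1e-8, 75.0),
    ]
    print(best_hits(demo_rows))
```

The resulting gene-contig pairs could then be clustered on each contig by overlap, as described in the third step.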

3. Results and discussion 

3.1. Chromosome arm sorting and DNA preparation

Both of the chromosome arms of wheat chromosome
6B were sorted as telocentric chromosomes
6BS and 6BL from a double-ditelosomic line (dDt6B)
by flow cytometry (Fig. 1). The use of telosomic stocks
and flow cytometric sorting permits the dissection of
the large wheat genome into small and well-defined
pieces, facilitating analysis and mapping. Chromosome
arms 6BS and 6BL were flow-sorted in batches of
59,000 and 49,000, respectively, from approximately
40,000 seeds obtained from 50 dDt6B plants, and the
average purity in the sorted fractions, as estimated by
FISH, was 91.2 and 92.8% for 6BS and 6BL, respectively.
The approximately 7-9% contamination was due to a
mixture of other chromosomes. Chromosomal DNA was
extracted and amplified in six independent multiple
displacement amplification reactions. Finally, we obtained
~20 µg of amplified DNA from each chromosome arm
for use in 454 shotgun sequencing.

Figure 1. The histogram of the relative fluorescence intensity (flow
karyotype) obtained from the flow cytometric analysis of DAPI-stained
mitotic metaphase chromosomes isolated from a
double-ditelosomic line 6B of the common wheat cultivar CS.
The histogram consists of a chromosome 3B peak, a small
composite peak I containing chromosomes 1D, 4D, and 6D, and
two large composite peaks, II and III, containing the remaining
16 chromosomes. The two additional peaks represent the short-arm
telosome 6BS and the long-arm telosome 6BL, which can be
easily discriminated and sorted. The two telosomes can be
identified by FISH with GAA microsatellite (red) and Afa repeat
(green) probes (insets). X-axis: relative DAPI fluorescence
intensity; Y-axis: number of particles.

3.2. Shotgun sequencing and assembly of chromosome
6B arms

The sequencing details representing the main metrics
of the 454 sequencing and the assemblies for 6BS and
6BL are summarized in Table 1. Because the estimated
lengths of 6BS and 6BL were 415 and 498 Mb, respectively,
our total read lengths, 4.94 Gb for 6BS and
5.51 Gb for 6BL, were equivalent to a sequencing
depth of ~11.9- and 11.1-fold, respectively. After the
sequence assembly and the removal of short contigs
(<200 bp), the total lengths of the assembled contigs
for 6BS and 6BL were 234.8 and 273.2 Mb, comprising
262 375 and 173 655 contigs, respectively, which
corresponds to 56.6 and 54.9% of the estimated lengths of
the two arms. These total lengths of the assembled contigs
and the coverages of the estimated chromosome size
were larger than those reported for other chromosome
arms: 60.9 Mb (20.6%) and 116.2 Mb (21.8%) for 5AS
and 5AL, respectively; and 146.7 Mb (46.3%) and
239.6 Mb (44.5%) for 4AS and 4AL, respectively.
Although the number of short contigs (<200 bp) is
relatively large, 43 618 and 21 978 for 6BS and 6BL, these
contigs are equivalent to only 6.41 and 3.23 Mb for
6BS and 6BL, respectively, which indicates that the
effect of these contigs on the total assembled length is
limited.
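
The depth and coverage figures quoted above follow directly from the totals reported in Table 1 and the estimated arm sizes. The short Python sketch below reproduces that arithmetic; the dictionary layout and function names are illustrative only.

```python
# Estimated arm sizes (bp) and the read/contig totals reported in Table 1.
ARMS = {
    "6BS": {"arm_bp": 415_000_000, "read_bp": 4_941_174_940,
            "contig_bp": 234_772_755},
    "6BL": {"arm_bp": 498_000_000, "read_bp": 5_507_636_827,
            "contig_bp": 273_193_549},
}


def fold_depth(read_bp: int, arm_bp: int) -> float:
    """Sequencing depth: total sequenced bases per base of the arm."""
    return read_bp / arm_bp


def assembled_percent(contig_bp: int, arm_bp: int) -> float:
    """Share of the estimated arm length covered by assembled contigs."""
    return 100.0 * contig_bp / arm_bp


if __name__ == "__main__":
    for arm, v in ARMS.items():
        depth = fold_depth(v["read_bp"], v["arm_bp"])
        cover = assembled_percent(v["contig_bp"], v["arm_bp"])
        print(f"{arm}: {depth:.1f}-fold depth, {cover:.1f}% of arm assembled")
    # 6BS: 11.9-fold depth, 56.6% of arm assembled
    # 6BL: 11.1-fold depth, 54.9% of arm assembled
```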

Table 1. The summary of 454 sequencing reactions and assemblies
for the short and long arms of chromosome 6B

                                 6BS              6BL
Total reads                      12 873 283       12 082 150
Total bases (bp)                 4 941 174 940    5 507 636 827
Average read length (bp)         383.83           456.00
Average quality value            27.6             27.3
Number of contigs                262 375          173 655
Total bases (bp)                 234 772 755      273 193 549
N50 (bp)                         1107             2675
Min (bp)                         200              200
Max (bp)                         24 902           38 754
Mean (bp)                        894.8            1573.2
Average depth (reads/contig)     9.05             10.1
Median depth (reads/contig)      5.2              6.7

3.3. Repeated structure of wheat chromosome 6B

The wheat genome is composed of abundant repetitive
elements, and >80% of the genome is occupied by
repeated sequences. Using the known TREP
library and the de novo repeat library from this study,
we determined that 76.6% of the 6BS assembly and
85.5% of the 6BL assembly correspond to repeat
elements. More than 13% of the repetitive regions in the
assemblies of both chromosome arms were masked
only in the de novo repeat library. These results indicate
that chromosome 6B contains novel, unannotated
repeat sequences, providing important insight into
the genomic structure of wheat chromosomes for
future reference genomic sequencing by the IWGSC.

We used the TREP library to further classify the
contigs that matched known transposable elements
(TEs) into TE families (or subfamilies) according to the
categories of a unified classification. We excluded
the repeated contigs in the de novo library from this
classification because these contigs were not well
annotated and therefore could not be integrated under the
same criteria as the contigs in the TREP library. We did
not observe any bias in the distribution of the repetitive
elements along each arm of 6B (Fig. 2). The LTR/Gypsy
family was most frequent in chromosome 6B, followed
by the LTR/Copia and DNA transposon CACTA families.
This distribution tendency observed in 6B is essentially
conserved in other sequenced wheat chromosomes
(Fig. 2).

Previous survey sequencing of wheat chromosome
5A revealed that most of the contigs with high coverage
rates consisted of repeated sequences. However,
highly masked contigs do not always have higher read
depths, as shown in Fig. 3, and contigs with lower read
depths were sometimes repetitive. We assume that
repeat sequences are not always the cause of genome
degeneration in the genomic assembly, and more
precise analysis is necessary for an accurate conclusion.

We used FISH to obtain insight into the distribution of
TEs along chromosome 6B. We selected 24 families of
TEs representing major components of the transposons
found in our assemblies (Fig. 2). We amplified the
unique regions of these TEs using PCR (Supplementary
Table S1); 20 samples yielded products of
the expected size. In two cases (Thalos and Icarus), the
fragment size was smaller than expected, and the
primers for Jorge and Athos did not produce discrete
bands by agarose gel electrophoresis; there were smear
patterns for Jorge and multiple bands for Athos. FISH
with a probe for the AG12 microsatellite was used to
identify chromosome 6B, which has the third strongest
signal in the interstitial region of the short arm
(Supplementary Fig. S1). Chromosome 6B can be
discriminated from chromosome 3B by the presence of
satellite DNA on the short arm. The distribution patterns
of transposons were represented by FISH signals with
each probe, as shown in Fig. 4. All of the tested probes
displayed dispersed localization patterns along the
length of chromosome 6B. Notably, the NOR (nucleolus
organizer region) and centromere regions lacked
transposon FISH signals. We did not observe striking
differences in the distribution patterns of the transposons,
and these patterns were not dependent on transposon
family, class, or type. Notably, the TE sequences were not
restricted to the C-band/N-band-positive pericentromeric
heterochromatin. We found that the distribution
of some transposons was not ubiquitous in the hexaploid
wheat complement (Supplementary Table S1).

Figure 2. The distribution of repetitive elements in wheat chromosomes 3B, 5A, and 6B. Repeat detection in chromosomes 3B and 5A was
conducted by the same procedure used for chromosome 6B. Only the TREP repeat data were used to categorize repetitive elements.
Bars: 6BS, 6BL, 3BS, 3BL, 5AS, 5AL. Legend categories: unknown, unknown, unknown; Retrotransposon, SINE, unknown;
Retrotransposon, LTR, unknown; Retrotransposon, LTR, Gypsy; Retrotransposon, LTR, Copia; Retrotransposon, LINE, unknown;
DNA transposon, unknown, unknown; DNA transposon, TIR, unknown; DNA transposon, TIR, Mutator; DNA transposon, TIR, Mariner;
DNA transposon, TIR, HAT; DNA transposon, TIR, Harbinger; DNA transposon, TIR, CACTA; DNA transposon, Helitron, Helitron.

Figure 3. The relationship between the masking ratio and the read depth of the contigs located in the short arm (A) and long arm (B) of
chromosome 6B. X-axis: masking ratio (0.1-0.9). Read depth classes (x): >100; 100 >= x > 50; 50 >= x > 20; 20 >= x > 10; 10 >= x > 5; 5 >= x >= 1.

3.4. Detection of transcribed regions 

More than 80,000 FLcDNAs/mRNAs and millions of
ESTs are available in the DDBJ/EMBL/GenBank and
TriFLDB public databases, facilitating wheat
transcriptome analysis. To detect the transcribed regions
and to annotate the gene structures on chromosome
6B, we mapped these transcribed sequences to our
assembled contigs. Using the two methods (transcript
mapping and ab initio gene prediction) described
in Materials and methods, we identified
2032 and 2766 loci on 6BS and 6BL, respectively, as
genic regions supported by the data from the wheat
transcriptome (Table 2), comparable with the results
for chromosomes 3B and 5A.

We analysed whether our data contained transcribed
regions of the genes involved in stress response,
pathogen resistance, and flowering and the genes encoding
seed storage proteins and some enzymes reported to
be located on wheat chromosome 6B. The α-gliadin
gene (acc. no. JX141494), the stripe rust resistance
gene Yr36 (EU835199), and the grain protein
content gene Gpc-B1 (DQ869673) were mapped to
contigs from 6BS, and the α-amylase gene (M16991)
and the genes for three low-temperature-responsive
dehydrins, Wcs120 (M93342), Wcs66 (L27516), and
Wcor410 (L29152), were mapped to 6BL contigs.
These results for the chromosomal assignment of
known genes were in accordance with previous
studies. The three homoeologous sequences
containing the flowering time genes, TaHd1-1, TaHd1-2,
and TaHd1-3, had been isolated from the long arm
of chromosomes 6A, 6B, and 6D, respectively. Our
analysis reconfirmed that the sequence of TaHd1-2
(AB094488) was mapped to a contig from 6BL with
100% identity, which is in good agreement with the
previous report. We also examined whether the genes
isolated from chromosomes 6A or 6D but not from 6B
had homoeologues on chromosome 6B. The gene
involved in vernalization, TmVIL2 (vernalization
insensitive 3-like 2) (DQ886917), has been isolated
from the diploid wheat Triticum monococcum and was
mapped to the short arm of chromosome 6Am. We
found a contig containing the whole gene sequence of
VIL2 on 6BS (Supplementary Fig. S2), which indicates
that the homoeologous gene copy of TmVIL2 is
conserved on chromosome 6B in hexaploid wheat. These
mapping data support our survey sequences of wheat
chromosome 6B, which contain previously reported
genic regions and are useful for mining the genes
located on chromosome 6B.



Figure 4. The distribution patterns of the TEs on chromosome 6B. Chromosome 6B was identified by the red AG12 signal, and the distributions
of the transposons are represented by the green signals. The chromosomes are arranged with the short arms on top. The transposon probes
displayed uniform labelling of chromosome 6B, with the exception of the satellite and centromeric regions.

Table 2. The statistics of the transcriptomes in the short and long arms of chromosome 6B

                                          Number of      Number of trimmed   6BS             6BL
                                          transcripts    transcripts
FLcDNA/mRNAs                              84 164         83 425              2703 (2238)*    4754 (3494)*
50% > masking                                                                1762 (1343)*    3016 (1851)*
EST                                       1 286 173      1 281 733           48 695          53 315
Predicted genes                                                              4967            5613
Overlapping with ESTs                                                        860             1377
Total loci with evidence of expression                                       2032            2766

3.5. Identification of the genes for functional
non-protein-coding RNAs

The short arm of wheat chromosome 6B is characterized
by the presence of a satellite and a secondary
constriction, the NOR (nucleolus organizer region), as is
chromosome 1B, and it features an rDNA locus that
contains approximately 5500 rRNA genes. To
explore the structure of the rDNA region, we searched
for contigs with homology to sequences corresponding
to the 5S, 5.8S, 18S, and 25S rDNAs and a spacer
sequence between 18S and 25S. From the 6BS
assembled sequences, we found only eight contigs
that showed homology to any sequence with >95%
identity and >100 bp alignment. However, seven of
these contigs exhibited extremely high read depths
(73.7-203.8) (Table 3). Because the average read
depth observed for 6BS was 9.05, the high depth rates
indicate that these contigs represent sequences with
high copy numbers. This result demonstrates that
rDNA regions, including spacer sequences, were
assembled in a few contigs because of high sequence
similarity under functional constraint. In general,
spacer sequences are more diverse than rDNA
sequences because of a low functional constraint. In
our study, the spacer sequences contained four repeat
families that can be a source of additional
diversification. Nevertheless, the rDNA regions consistently
have high read depths in our survey sequencing,
which suggests that concerted evolution occurred in
this region, although the possibility that the rDNA
regions were duplicated recently cannot be excluded.

Table 3. Contigs containing rDNA on the short arm of chromosome 6B

Contig          Query    ...                            Read depth
Contig254561    ...      98     231    3450    228      202.7
Contig172978    18S      99     505    1869    303      188.7
Contig112653    5.8S     96     722    223     164      121.1
Contig113562    Spacer   100    718    4642    718      203.8
Contig254561    Spacer   98     231    4642    231      202.7
Contig177561    Spacer   99     489    4642    489      140.2
Contig225039    Spacer   100    ...    4642    ...
Contig53095     Spacer   ...    1136   ...

Figure 5. The prediction of tRNA species on chromosome 6B. The number of tRNAs detected by tRNAscan-SE ver. 1.3.1 was counted.
Pseudogenes were excluded, and tRNAs covered by repetitive elements were included.

We detected 213 and 167 predicted tRNA genes in
6BS and 6BL, respectively. In both chromosome arms,
the gene for one tRNA species was the most abundant,
followed by the genes for a second tRNA species (Fig. 5).
Such a skewed distribution of tRNA genes was not observed
in any of the syntenic chromosomes to wheat chromosome 6
in other grass species, e.g. chromosome 2 of Oryza sativa,
chromosome 3 of B. distachyon, and chromosome 4 of
Sorghum bicolor. We hypothesized that the expansion
of a particular tRNA gene could be caused by repetitive
elements, that is, if one tRNA gene is located in a
repetitive region, the copy number of the gene increases
dramatically along with the propagation of the repetitive
sequence. As expected, 83 of the 131 genes for the most
abundant tRNA species on chromosome 6B were located in
an LTR retrotransposon, Gypsy, or de novo repeats. Although
the details of the de novo repeats containing these tRNA genes
are not clear, tRNA expansion could occur through repetitive
elements. Because 86.7% of tRNAs located in repeat
regions belonged to these two tRNA species, the skewed
distribution of the tRNA genes on wheat chromosome 6B can
be explained by the expansion of repetitive elements
containing specific tRNA genes.

miRNAs are a class of small RNAs that mediate gene 
silencing at the post-transcriptional level. Only 42 
wheat miRNAs are stored in miRBase as of release 19, 
which is considerably fewer than the numbers for other 
grass plants (rice: 708, maize: 321, sorghum: 242, and 
Brachypodium: 136). These observations suggest that 
more miRNA genes remain to be identified in the wheat 
genome. To find known and novel miRNA genes in our 
assembly, we conducted a homology search using the 
known plant miRNAs in miRBase. A total of 2906 miRNA 
loci (1381 on 6BS and 1525 on 6BL) were predicted 
using 350 mature plant miRNAs in miRBase as queries 
(Table 4). Some miRNAs have been reported to be located 
in repeat regions, as has been observed for tRNA genes. 
Consistent with these previous reports, all but 26 of 
the predicted miRNA genes are located in repeat-masked 
regions. In particular, 1805 miRNA genes were located 
in the DNA transposon Mariner, and 766 genes were 
located in CACTA repeats. Even though the LTR 
retrotransposons Gypsy and Copia were distributed most 
widely on both arms, only 63 miRNA genes were located 
in these transposons. These results indicate that 
miRNA genes propagate in the wheat genome with the 
diffusion of specific transposons, although which of 
the predicted miRNA genes are transcribed into mature 
miRNAs has not been determined. 

Table 4. Putative miRNA species identified in the survey sequences 
of the short and long arms of chromosome 6B 

                                                   6BS    6BL   Both 
Locus                                             1381   1525 
Wheat miRNA evidence                               825    913 
Non-wheat miRNA evidence                           556    612 
Number of hit query                                205    204    350 
Identical locus to query miRNA                     146    175 
Number of miRNA for identical locus (wheat)         10     12     13 
Number of miRNA for identical locus (non-wheat)      4      8      8 
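
The counts quoted for the miRNA loci can be cross-checked with simple arithmetic. The sketch below uses only numbers from the text and Table 4. It assumes that the 26 loci outside repeat-masked regions and the per-class counts all refer to the same set of 2906 predicted loci. The number of loci in other repeat classes is derived here and is not a figure reported in the text.

```python
# Counts quoted in the text and in Table 4.
loci = {"6BS": 1381, "6BL": 1525}
wheat_evidence = {"6BS": 825, "6BL": 913}
non_wheat_evidence = {"6BS": 556, "6BL": 612}

total_loci = sum(loci.values())

# Each arm's loci should split exactly into wheat and non-wheat evidence.
for arm in loci:
    assert wheat_evidence[arm] + non_wheat_evidence[arm] == loci[arm]

# "All but 26" of the predicted loci lie in repeat-masked regions.
outside_repeats = 26
in_repeats = total_loci - outside_repeats

# Repeat classes named in the text; Gypsy and Copia are reported together.
named_classes = {"Mariner": 1805, "CACTA": 766, "Gypsy/Copia": 63}

# Derived value (assumption: the remainder sits in repeat classes
# that the text does not name).
other_repeats = in_repeats - sum(named_classes.values())

for name, count in named_classes.items():
    share = 100.0 * count / total_loci
    print(f"{name}: {count} loci ({share:.1f}% of all predicted loci)")
print(f"Other repeat classes (derived): {other_repeats}")
print(f"Outside repeat-masked regions: {outside_repeats}")
```

Under these assumptions the two arm totals add up to the 2906 loci stated in the text, and Mariner and CACTA together account for most of them.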



3.6. Comparative analysis of syntenic chromosomes to 
wheat 6B among monocot species 
The barley genome sequence was recently reported, 
and the annotation data for this species, as the closest 
relative of wheat, can be used as a possible gene set. Of 
the 26 159 high-confidence genes in barley, 2573 loci 
are located on chromosome 6H. Mapping barley 
genes to our wheat 6B assembly revealed that 2399 
genes had significant hits on 2070 loci of wheat 6B 
(E-value < 10~^). Chromosome arm information is 
available for 423 and 313 genes located on chromosome 
arms 6HS and 6HL, respectively; therefore, we compared 
these barley genes with the arm information of our wheat 
assemblies. We found that 380 of 423 6HS genes 
(89.8%) mapped to chromosome 6BS, and 246 of 
313 6HL genes (78.6%) mapped to chromosome 6BL. 
Based on these data, we concluded that our assemblies 
provide good coverage of the transcribed regions, a 
conclusion supported by the synteny with barley chromosome 
6H. However, chromosome arm information is still 
missing for more than two-thirds of the genes on 6H; 
therefore, we cannot analyse the syntenic relationship 
between wheat 6B and barley 6H more precisely. 

Figure 6. The distributions of the genes found on chromosome 6B 
with significant similarity to O. sativa, B. distachyon, and S. 
bicolor. The numbers in parentheses represent loci on which 
genes from syntenic chromosomes were mapped. 
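
The arm-level mapping rates for the barley 6H genes follow directly from the counts given above. The short sketch below recomputes them, along with the share of 6H genes that lack arm information. The variable and function names are illustrative, and all numbers are taken from the text.

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# Counts quoted in the text.
genes_on_6h = 2573                                 # high-confidence barley genes on 6H
arm_assigned = {"6HS": 423, "6HL": 313}            # 6H genes with arm information
mapped_to_6b_arm = {"6HS": 380, "6HL": 246}        # mapped to 6BS and 6BL, respectively

# Mapping rate per arm.
rates = {arm: percent(mapped_to_6b_arm[arm], total) for arm, total in arm_assigned.items()}
for arm, rate in rates.items():
    print(f"{arm}: {mapped_to_6b_arm[arm]}/{arm_assigned[arm]} = {rate}%")

# 6H genes for which no arm assignment is available.
without_arm_info = genes_on_6h - sum(arm_assigned.values())
missing_share = percent(without_arm_info, genes_on_6h)
print(f"No arm information: {without_arm_info}/{genes_on_6h} = {missing_share}%")
```

The recomputed rates match the 89.8% and 78.6% stated in the text, and about 71% of the 6H genes have no arm assignment, consistent with the statement that more than two-thirds are missing this information.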

We also compared the wheat 6B assemblies with the 
annotation data for monocot plant species such as 
O. sativa, B. distachyon, and S. bicolor to identify 
homologous regions. Our search indicated that 8783 loci 
from the three monocot plants were possibly homologous 
to wheat 6B contigs, 3880 of which were found in all 
four genomes (Fig. 6). Wheat homoeologous group 6 
chromosomes are syntenic with chromosome 2 of 
O. sativa (Os02), chromosome 3 of B. distachyon 
(Bradi3), and chromosome 4 of S. bicolor (Sb04). 
Our results demonstrated that 3772 loci mapped to 
at least one of the syntenic chromosomes of the 
three monocot plants. The mapping ratio of syntenic 
genes (40.2-59.7% between our annotated loci and those 
of syntenic chromosomes in the other three species) was 
comparable with the total coverage of our assembly for 
wheat chromosome 6 (55.6%). 
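
Counting loci shared among the compared species, as summarized in Fig. 6, is a set-overlap problem. The following is a minimal sketch with made-up locus identifiers. It assumes that the homology search yields, for each species, the set of wheat 6B loci with a significant hit, which may differ from how the counts in this study were actually derived.

```python
# Hypothetical data: wheat 6B loci with a significant hit in each species.
# The locus identifiers are invented for illustration.
hits = {
    "O. sativa": {"locus1", "locus2", "locus3", "locus5"},
    "B. distachyon": {"locus1", "locus2", "locus4"},
    "S. bicolor": {"locus1", "locus3", "locus4", "locus6"},
}

# Loci with a hit in at least one species, and in all three species.
any_species = set.union(*hits.values())
all_species = set.intersection(*hits.values())


def region_counts(hit_sets):
    """Count loci for each exact combination of species, as in a Venn diagram."""
    counts = {}
    for locus in set.union(*hit_sets.values()):
        combo = tuple(sorted(name for name, loci in hit_sets.items() if locus in loci))
        counts[combo] = counts.get(combo, 0) + 1
    return counts


regions = region_counts(hits)
print(f"Hit in at least one species: {len(any_species)}")
print(f"Hit in all three species: {len(all_species)}")
for combo in sorted(regions):
    print(" + ".join(combo), regions[combo])
```

Each locus falls into exactly one Venn region, so the region counts sum to the number of loci with a hit in at least one species.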

To verify the reliability of these loci, we assessed the 
wheat transcriptome evidence, such as wheat FLcDNA/mRNAs 
and predicted genes with EST evidence. We found that 
57.4% of the syntenic regions had transcriptome 
evidence, which was significantly higher than the 
value for non-syntenic regions (32.7%). In particular, 
the regions syntenic to all three monocot species were 
highly supported by the transcriptome data (79.9%). 
These results confirmed that wheat chromosome 6 has 
conserved synteny with the chromosomes of other 
grass species at the sequence level. 

4. Conclusions 

Here, we have provided the whole-chromosome 
shotgun sequence of wheat chromosome 6B, which 
provides an overview of the sequence features of this 
chromosome, including rDNA regions, a characteristic 
structure of wheat 6B. We have also presented new 
information about the TEs, expressed genes that are 
syntenic in other phylogenetically related species, and 
non-protein-coding tRNA and miRNA genes. We are now 
conducting the reference genome sequencing of chromosome 
6B using the MTP BACs within the framework of the 
IWGSC. However, filling the sequence gaps and evaluating 
the quality of the assembly data using only one data 
set may be difficult. This survey sequence, together 
with the mate-pair sequencing that is also now underway, 
provides valuable information for completing the genome 
assembly. Furthermore, the survey sequence information 
in this study will be directly used to identify 6B 
genes that can be exploited to control agronomically 
important traits and to construct DNA markers for 
these traits. The assembled contigs will be available 
for browsing on our web site (KomugiGSP; 
http://komugigsp.dna.affrc.go.jp/index.html, 20 September 
2013, date last accessed). 